

On Reward-Free Reinforcement Learning with Linear Function Approximation

Neural Information Processing Systems

During the exploration phase, an agent collects samples without using a pre-specified reward function. After the exploration phase, a reward function is given, and the agent uses samples collected during the exploration phase to compute a near-optimal policy.
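The two-phase protocol described above can be sketched on a toy tabular MDP. This is an illustrative sketch, not the paper's linear-function-approximation algorithm: a chain MDP, random exploration without rewards, then finite-horizon value iteration on the learned empirical model once a reward function is revealed.

```python
import random
from collections import defaultdict

# Toy chain MDP: states 0..4, actions -1/+1 (hypothetical illustration
# of the reward-free two-phase protocol, not the paper's algorithm).
N_STATES, ACTIONS, HORIZON = 5, (-1, +1), 20

def step(s, a):
    return max(0, min(N_STATES - 1, s + a))

# --- Exploration phase: collect transitions without any reward signal ---
counts = defaultdict(lambda: defaultdict(int))
rng = random.Random(0)
for _ in range(2000):
    s = rng.randrange(N_STATES)
    a = rng.choice(ACTIONS)
    counts[(s, a)][step(s, a)] += 1

# --- Planning phase: a reward function is revealed; plan on the learned model ---
def plan(reward):
    V = [0.0] * N_STATES
    policy = [ACTIONS[0]] * N_STATES
    for _ in range(HORIZON):  # finite-horizon value iteration
        newV = [0.0] * N_STATES
        for s in range(N_STATES):
            best = None
            for a in ACTIONS:
                total = sum(counts[(s, a)].values())
                if total == 0:
                    nxt = V[s]  # unvisited pair: assume a self-loop
                else:
                    nxt = sum(c / total * V[s2] for s2, c in counts[(s, a)].items())
                q = reward(s) + nxt
                if best is None or q > best:
                    best, policy[s] = q, a
            newV[s] = best
        V = newV
    return V, policy

# Reward revealed only now: 1 at the rightmost state, 0 elsewhere.
V, pi = plan(lambda s: 1.0 if s == N_STATES - 1 else 0.0)
```

Because the same exploration data can be reused, any reward function revealed later can be planned against without collecting new samples.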





Macroscopic EEG Reveals Discriminative Low-Frequency Oscillations in Plan-to-Grasp Visuomotor Tasks

Cetera, Anna, Ghafoori, Sima, Rabiee, Ali, Farhadi, Mohammad Hassan, Shahriari, Yalda, Abiri, Reza

arXiv.org Artificial Intelligence

Abstract--Objective: The vision-based grasping brain network integrates visual perception with cognitive and motor processes for visuomotor tasks. While invasive recordings have successfully decoded localized neural activity related to grasp type planning and execution, macroscopic neural activation patterns captured by noninvasive electroencephalography (EEG) remain far less understood. Methods: We introduce a novel vision-based grasping platform to investigate grasp-type-specific (precision, power, no-grasp) neural activity across large-scale brain networks using EEG neuroimaging. The platform isolates grasp-specific planning from its associated execution phases in naturalistic visuomotor tasks, where the Filter-Bank Common Spatial Pattern (FBCSP) technique was designed to extract discriminative frequency-specific features within each phase. Support vector machine (SVM) classification discriminated binary (precision vs. power, grasp vs. no-grasp) and multiclass (precision vs. power vs. no-grasp) scenarios for each phase, and was compared against traditional Movement-Related Cortical Potential (MRCP) methods. Results: Low-frequency oscillations (0.5-8 Hz) carry grasp-related information established during planning and maintained throughout execution, with consistent classification performance across both phases (75.3-77.8%). Higher-frequency activity (12-40 Hz) showed phase-dependent results with 93.3% accuracy for grasp vs. no-grasp classification but 61.2% for precision vs. power discrimination. Feature importance using SVM coefficients identified discriminative features within frontoparietal networks during planning and motor networks during execution. Conclusion: This work demonstrated the role of low-frequency oscillations in decoding grasp type during planning using noninvasive EEG. Significance: These findings provide a foundation toward scalable, intention-driven Brain-Machine-Interface (BMI) control strategies.
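The bandpass-filter / CSP / log-variance / classifier pipeline named in the abstract can be sketched on synthetic data. Everything here is an assumption for illustration (channel count, the 0.5-8 Hz band, FFT-based filtering, a nearest-mean classifier in place of the paper's SVM); it shows the shape of an FBCSP-style pipeline, not the authors' configuration.

```python
import numpy as np

# Hypothetical FBCSP-style sketch: bandpass -> CSP spatial filters ->
# log-variance features -> nearest-class-mean classifier, on synthetic
# two-class "EEG" epochs. All parameters are illustrative assumptions.
rng = np.random.default_rng(0)
FS, N_CH, N_T = 128, 4, 256

def make_epoch(cls):
    # Class difference: which channel carries a strong 6 Hz rhythm.
    x = 0.5 * rng.standard_normal((N_CH, N_T))
    t = np.arange(N_T) / FS
    x[cls] += 2.0 * np.sin(2 * np.pi * 6 * t + rng.uniform(0, 2 * np.pi))
    return x

def bandpass(x, lo, hi):
    # Crude FFT-mask bandpass filter (real pipelines would use IIR/FIR filters).
    X = np.fft.rfft(x, axis=-1)
    f = np.fft.rfftfreq(x.shape[-1], 1 / FS)
    X[..., (f < lo) | (f > hi)] = 0
    return np.fft.irfft(X, n=x.shape[-1], axis=-1)

def csp_filters(class0, class1, n_pairs=1):
    # Joint diagonalisation of the two class covariances (standard CSP).
    cov = lambda epochs: np.mean([e @ e.T / e.shape[1] for e in epochs], axis=0)
    C0, C1 = cov(class0), cov(class1)
    d, U = np.linalg.eigh(C0 + C1)          # whiten the composite covariance
    P = np.diag(d ** -0.5) @ U.T
    _, W = np.linalg.eigh(P @ C0 @ P.T)     # diagonalise whitened class-0 cov
    filt = W.T @ P
    return np.vstack([filt[:n_pairs], filt[-n_pairs:]])  # extreme components

def features(epoch, filt):
    v = np.var(filt @ epoch, axis=1)
    return np.log(v / v.sum())              # normalised log-variance features

train0 = [bandpass(make_epoch(0), 0.5, 8) for _ in range(40)]
train1 = [bandpass(make_epoch(1), 0.5, 8) for _ in range(40)]
filt = csp_filters(train0, train1)
m0 = np.mean([features(e, filt) for e in train0], axis=0)
m1 = np.mean([features(e, filt) for e in train1], axis=0)

def predict(epoch):                         # nearest class mean in feature space
    f = features(bandpass(epoch, 0.5, 8), filt)
    return int(np.linalg.norm(f - m1) < np.linalg.norm(f - m0))

acc = np.mean([predict(make_epoch(c)) == c for c in [0, 1] * 25])
```

A full FBCSP system repeats the CSP step per frequency band and concatenates the selected features before the SVM; this sketch uses a single 0.5-8 Hz band to keep the idea visible.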





Emphasize the technical novelty of our upper bound and lower bound, as Reviewer #1, Reviewer #3, and Reviewer #4 commented on the technical novelty of our theoretical results

Neural Information Processing Systems

We thank all the reviewers for their valuable feedback and for appreciating our contributions. Technical novelty of the upper bound. In the exploration phase, Jin et al. [2020] set reward to be To our knowledge, this idea is new in the literature. For example, for the hard instance in [Du et al., 2020], only a single state-action pair has non-zero reward. Moreover, we focus on the reward-free setting while Du et al. [2020] focused on the standard RL setting. Below we address specific concerns from each reviewer.


Open Source Planning & Control System with Language Agents for Autonomous Scientific Discovery

Xu, Licong, Sarkar, Milind, Lonappan, Anto I., Zubeldia, Íñigo, Villanueva-Domingo, Pablo, Casas, Santiago, Fidler, Christian, Amancharla, Chetana, Tiwari, Ujjwal, Bayer, Adrian, Ekioui, Chadi Ait, Cranmer, Miles, Dimitrov, Adrian, Fergusson, James, Gandhi, Kahaan, Krippendorf, Sven, Laverick, Andrew, Lesgourgues, Julien, Lewis, Antony, Meier, Thomas, Sherwin, Blake, Surrao, Kristen, Villaescusa-Navarro, Francisco, Wang, Chi, Xu, Xueqing, Bolliet, Boris

arXiv.org Artificial Intelligence

We present a multi-agent system for automation of scientific research tasks, cmbagent (https://github.com/CMBAgents/cmbagent). The system is formed by about 30 Large Language Model (LLM) agents and implements a Planning & Control strategy to orchestrate the agentic workflow, with no human-in-the-loop at any point. Each agent specializes in a different task (performing retrieval on scientific papers and codebases, writing code, interpreting results, critiquing the output of other agents) and the system is able to execute code locally. We successfully apply cmbagent to carry out a PhD level cosmology task (the measurement of cosmological parameters using supernova data) and evaluate its performance on two benchmark sets, finding superior performance over state-of-the-art LLMs. The source code is available on GitHub, demonstration videos are also available, and the system is deployed on HuggingFace and will be available on the cloud.
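The Planning & Control orchestration described above can be sketched as a plan-then-dispatch loop. The agent roles, plan format, and stub agent functions below are illustrative assumptions; in cmbagent the planner and specialists are LLM-backed (see its GitHub repository), not hard-coded functions.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a Planning & Control loop: a planner decomposes the
# task into agent-tagged steps, then a controller dispatches each step to a
# specialist agent with no human in the loop. Agent names are assumptions.
@dataclass
class Step:
    agent: str
    instruction: str
    result: str = ""

@dataclass
class Plan:
    steps: list = field(default_factory=list)

def planner(task):
    # Stand-in for an LLM planner: produce agent-tagged steps for the task.
    return Plan([Step("researcher", f"survey literature for: {task}"),
                 Step("engineer", f"write analysis code for: {task}"),
                 Step("executor", "run the code locally"),
                 Step("critic", "review results and flag issues")])

# Stand-ins for specialist LLM agents (retrieval, coding, execution, critique).
AGENTS = {
    "researcher": lambda s: f"notes({s.instruction})",
    "engineer":   lambda s: f"code({s.instruction})",
    "executor":   lambda s: "output: ok",
    "critic":     lambda s: "approved",
}

def control(plan):
    # Control phase: execute the plan step by step, recording each result.
    for step in plan.steps:
        step.result = AGENTS[step.agent](step)
    return plan

done = control(planner("measure cosmological parameters from supernova data"))
```

In the real system each dispatch is an LLM call (possibly with tool use and local code execution), and the critic's feedback can trigger replanning rather than ending the loop.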


Look Before Leap: Look-Ahead Planning with Uncertainty in Reinforcement Learning

Liu, Yongshuai, Liu, Xin

arXiv.org Artificial Intelligence

Model-based reinforcement learning (MBRL) has demonstrated superior sample efficiency compared to model-free reinforcement learning (MFRL). However, the presence of inaccurate models can introduce biases during policy learning, resulting in misleading trajectories. The challenge lies in obtaining accurate models due to limited diverse training data, particularly in regions with limited visits (uncertain regions). Existing approaches passively quantify uncertainty after sample generation, failing to actively collect uncertain samples that could enhance state coverage and improve model accuracy. Moreover, MBRL often faces difficulties in making accurate multi-step predictions, thereby impacting overall performance. To address these limitations, we propose a novel framework for uncertainty-aware policy optimization with model-based exploratory planning. In the model-based planning phase, we introduce an uncertainty-aware k-step lookahead planning approach to guide action selection at each step. This process involves a trade-off analysis between model uncertainty and value function approximation error, effectively enhancing policy performance. In the policy optimization phase, we leverage an uncertainty-driven exploratory policy to actively collect diverse training samples, resulting in improved model accuracy and overall performance of the RL agent. Our approach offers flexibility and applicability to tasks with varying state/action spaces and reward structures. We validate its effectiveness through experiments on challenging robotic manipulation tasks and Atari games, surpassing state-of-the-art methods with fewer interactions, thereby leading to significant performance improvements.
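The k-step lookahead with an uncertainty trade-off described in the abstract can be sketched on a toy 1-D control task. The ensemble-disagreement uncertainty measure, the penalty weight `beta`, and the hold-still tail policy below are assumptions for the sketch, not the paper's exact formulation.

```python
import numpy as np

# Illustrative uncertainty-aware k-step lookahead: score each candidate first
# action by its mean k-step return under an ensemble of dynamics models, minus
# a penalty on ensemble disagreement (a proxy for model uncertainty).
rng = np.random.default_rng(1)
ACTIONS = (-1.0, 0.0, 1.0)

def reward(s):
    return -abs(s - 1.0)        # goal: drive the state toward s = 1

# Ensemble of "learned" dynamics models (here: perturbed copies of s' = s + 0.1a).
ensemble = [lambda s, a, e=eps: s + (0.1 + e) * a
            for eps in rng.normal(0, 0.01, size=5)]

def lookahead(s0, k=3, beta=1.0):
    best_a, best_score = None, -np.inf
    for a0 in ACTIONS:
        returns, finals = [], []
        for model in ensemble:
            s, ret = s0, 0.0
            for t in range(k):
                a = a0 if t == 0 else 0.0   # assumed tail policy: hold still
                s = model(s, a)
                ret += reward(s)
            returns.append(ret)
            finals.append(s)
        # Trade-off: expected return vs. disagreement across ensemble members.
        score = np.mean(returns) - beta * np.std(finals)
        if score > best_score:
            best_a, best_score = a0, score
    return best_a

a = lookahead(0.0)              # from s = 0, moving toward the goal should win
```

The paper's second ingredient, the uncertainty-driven exploratory policy, would instead *seek* high-disagreement regions during data collection (flipping the sign of the penalty) so the ensemble's weak spots get more training samples.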